Motivation for Effective Rank

Rank is a fundamental concept in linear algebra, representing the intrinsic dimensionality of a matrix. However, the strict mathematical definition of rank often does not align well with the realities of numerical computation. Mathematically, rank equals the number of nonzero singular values, but what counts as "equal to zero" differs between mathematics and numerical computation. In mathematics, "equal to zero" is absolute and strict: even $10^{-100}$ is not zero. In numerical computation, however, $10^{-10}$ is often treated as zero.

Therefore, we seek to extend the concept of rank to a form more suitable for numerical computation characteristics. This gives rise to the concept of Effective Rank.

Error Truncation#

No Unified Definition

It should be noted that there is currently no unified definition of effective rank in academia. What follows are several approaches to defining effective rank from different perspectives. For practical problems, readers can choose the definition most suitable for their application.

We primarily consider rank from the perspective of singular values. For matrix $\boldsymbol{M}\in\mathbb{R}^{n\times m}$, without loss of generality, assume $n\leq m$. Its standard rank is:

(1) \[ \mathop{\text{rank}}(\boldsymbol{M}) \triangleq \max\{i\,|\,\sigma_i > 0\}\leq n \]

where $\sigma_1\geq \sigma_2\geq \cdots\geq\sigma_n \geq 0$ are the singular values of $\boldsymbol{M}$. Intuitively, the concept of effective rank aims to treat singular values close to zero as zero. Therefore, a basic property of effective rank is that it does not exceed the standard rank and can degenerate to the standard rank in special cases. A simple definition satisfying this property is:

(2) \[ \mathop{\text{erank}}(\boldsymbol{M},\epsilon) \triangleq \max\{i\,|\,\sigma_i > \epsilon\} \]
Relative vs Absolute Thresholds

However, "large" and "small" are relative notions: 1 is large relative to 0.01 but small relative to 100. Normalizing by $\sigma_1$ is therefore more principled:

(3) \[ \mathop{\text{erank}}(\boldsymbol{M},\epsilon) \triangleq \max\big\{i\,\big|\,\sigma_i/\sigma_1 > \epsilon\big\} \]
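As a concrete illustration, here is a minimal numpy sketch of the two threshold-based definitions; the function names and the choice of $\epsilon$ are ours, not from the original article.

```python
import numpy as np

def erank_abs(M, eps):
    # Eq. (2): count singular values above an absolute threshold.
    # Singular values come back sorted in descending order, so the
    # count equals the largest index i with sigma_i > eps.
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > eps).sum())

def erank_rel(M, eps):
    # Eq. (3): threshold measured relative to the largest singular value.
    s = np.linalg.svd(M, compute_uv=False)
    return int((s / s[0] > eps).sum())

M = np.diag([100.0, 1.0, 0.01])
print(erank_abs(M, 0.1))  # 2: the absolute cutoff only drops 0.01
print(erank_rel(M, 0.1))  # 1: relative to 100, even 1.0 counts as "small"
```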

Beyond directly truncating singular values close to zero, we can also consider this problem from the perspective of low-rank approximation. In "The Path to Low-Rank Approximation (Part 2): SVD", we proved that the optimal rank-$r$ approximation of a matrix is the SVD result retaining only the largest $r$ singular values. Conversely, we can specify a relative error $\epsilon$ and define effective rank as the minimum rank achieving this relative error:

(4) \[ \mathop{\text{erank}}(\boldsymbol{M},\epsilon) \triangleq \min\left\{ r\,\,\middle|\,\,\sqrt{\left(\sum_{i=1}^r \sigma_i^2\right)\Big/\left(\sum_{i=1}^n \sigma_i^2\right)} \geq 1-\epsilon\right\} \]

This definition has clearer numerical meaning, but it only considers overall error, making some examples less elegant. For instance, for an $n\times n$ identity matrix, we expect its effective rank to always be $n$, because each singular value is equally 1, and there should be no truncation behavior. However, using the above definition, when $n$ is sufficiently large ($1/n < \epsilon$), the effective rank becomes less than $n$.
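The same idea in code: a hedged sketch of Equation (4), together with the identity-matrix caveat just described (the function name and test size are illustrative).

```python
import numpy as np

def erank_lowrank(M, eps):
    # Eq. (4): smallest r whose rank-r SVD truncation reaches
    # relative accuracy 1 - eps in Frobenius norm.
    s = np.linalg.svd(M, compute_uv=False)
    ratios = np.sqrt(np.cumsum(s**2) / np.sum(s**2))
    return int(np.searchsorted(ratios, 1 - eps) + 1)

# For the identity matrix every singular value is 1, yet once
# 1/n < eps the definition truncates below n.
I = np.eye(1000)
print(erank_lowrank(I, 0.1))  # about 810, not 1000
```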

Norm Ratios#

Although the effective rank definitions in the previous section are intuitive, they all depend on a hyperparameter $\epsilon$, ultimately lacking simplicity. Our basic understanding now is that the concept of effective rank can only rely on relative magnitudes. Thus, the problem reduces to constructing effective rank from $1\geq \sigma_2/\sigma_1\geq\cdots\geq\sigma_n/\sigma_1\geq 0$. Given that all values lie in $[0,1]$, a clever idea is to directly sum them:

(5) \[ \mathop{\text{erank}}(\boldsymbol{M}) \triangleq \sum_{i=1}^n\frac{\sigma_i}{\sigma_1} \]
Matrix Norm Interpretations

From "The Path to Low-Rank Approximation (Part 2): SVD", we know that the largest singular value $\sigma_1$ is the matrix's spectral norm, denoted $\Vert\boldsymbol{M}\Vert_2$. The sum of all singular values is also a matrix norm called the "nuclear norm," typically denoted $\Vert\boldsymbol{M}\Vert_*$. Thus, the above expression can be simplified as:

(6) \[ \mathop{\text{erank}}(\boldsymbol{M}) \triangleq \frac{\Vert\boldsymbol{M}\Vert_*}{\Vert\boldsymbol{M}\Vert_2} \]

This may originate from "An Introduction to Matrix Concentration Inequalities", where it is called "Intrinsic Dimension." However, related properties were already explored in "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization".

Similarly, we can construct effective rank by summing squares:

(7) \[ \mathop{\text{erank}}(\boldsymbol{M}) \triangleq \sum_{i=1}^n\frac{\sigma_i^2}{\sigma_1^2} = \frac{\Vert\boldsymbol{M}\Vert_F^2}{\Vert\boldsymbol{M}\Vert_2^2} \]

Here $\Vert\cdot\Vert_F$ is the Frobenius norm. This definition likely originates from "Sampling from large matrices: an approach through geometric functional analysis", where it was called "Numerical Rank" and is now more commonly referred to as "Stable Rank," one of the more popular effective rank concepts.
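Both ratios are one-liners once the singular values are in hand; a minimal sketch, with function names of our own choosing:

```python
import numpy as np

def erank_intrinsic(M):
    # Eq. (6): nuclear norm over spectral norm ("intrinsic dimension").
    s = np.linalg.svd(M, compute_uv=False)
    return s.sum() / s[0]

def erank_stable(M):
    # Eq. (7): squared Frobenius norm over squared spectral norm
    # ("stable rank").
    s = np.linalg.svd(M, compute_uv=False)
    return (s**2).sum() / s[0]**2

M = np.random.randn(100, 300)
print(erank_intrinsic(M), erank_stable(M))
```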

Computational Considerations

From a computational complexity perspective, Equation (7) (the squared version) has lower complexity than Equation (6) (the unsquared version). Computing the nuclear norm requires all singular values, meaning a full SVD is needed. However, $\Vert\boldsymbol{M}\Vert_F^2$ equals the sum of squares of all matrix elements, so the main computational cost of Equation (7) is computing the largest singular value, which is cheaper than computing all singular values. If desired, we can generalize Equations (6) and (7) to $k$-th power sums; the results are essentially ratios of more general Schatten norms to the spectral norm.
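To make the cost argument concrete, here is a sketch that estimates the stable rank with no SVD at all: $\Vert\boldsymbol{M}\Vert_F^2$ is read off from the raw entries, and $\sigma_1$ comes from power iteration on $\boldsymbol{M}^{\top}\boldsymbol{M}$. The fixed iteration count (rather than a convergence test) is a simplification.

```python
import numpy as np

def spectral_norm(M, iters=100, seed=0):
    # Power iteration on M^T M converges to the top right singular
    # vector; ||M v|| then estimates sigma_1. No full SVD required.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[1])
    for _ in range(iters):
        v = M.T @ (M @ v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(M @ v)

def stable_rank_cheap(M):
    # ||M||_F^2 is just the sum of squared entries.
    return (M**2).sum() / spectral_norm(M)**2
```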

Distribution and Entropy#

If readers search for "Effective Rank," they will likely find the paper "The Effective Rank: a Measure of Effective Dimensionality", one of the earliest works to explore effective rank, which proposes defining it via entropy.

Probability Distribution from Singular Values

First, since singular values are always non-negative, we can normalize them to obtain a probability distribution:

(8) \[ p_i = \frac{\sigma_i^{\gamma}}{\sum_{j=1}^n \sigma_j^{\gamma}} \]

where $\gamma > 0$. From the literature, $\gamma=1$ and $\gamma=2$ are both frequently used (we used $\gamma=2$ in Moonlight). Below, we take $\gamma=1$ as an example. With a probability distribution, we can compute the information entropy (Shannon entropy):

(9) \[ H = -\sum_{i=1}^n p_i \log p_i \]

Recall that the range of entropy is $[0, \log n]$. After exponentiation, we have $e^H \in [1, n]$. When the distribution is one-hot (only one nonzero singular value), $e^H=1$; when the distribution is uniform (all singular values equal), $e^H=n$. These are precisely the two special cases of standard rank. This inspires us to define effective rank as:

(10) \[ \mathop{\text{erank}}(\boldsymbol{M}) \triangleq e^H = \exp\left(-\sum_{i=1}^n p_i \log p_i\right) \]

Substituting the definition of $p_i$, we find it can be further transformed into:

(11) \[ \mathop{\text{erank}}(\boldsymbol{M}) = \exp\left(\log\sum_{i=1}^n \sigma_i -\frac{\sum_{i=1}^n \sigma_i\log\sigma_i}{\sum_{i=1}^n \sigma_i}\right) \]

Clearly, exponentiating the first term inside the parentheses yields $\Vert\boldsymbol{M}\Vert_*$. The second term is the weighted average of $\log\sigma_i$ with weights $\sigma_i$; it is dominated by the largest singular values and thus approximately equals $\log\sigma_1$, which becomes $\sigma_1=\Vert\boldsymbol{M}\Vert_2$ after exponentiation. Therefore, the overall result approximates Equation (6). This indicates that defining effective rank via entropy, while appearing to be a completely different path, actually shares remarkable similarities with the norm-ratio approach of the previous section.
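A sketch of the entropy-based definition, compared numerically against the nuclear-to-spectral ratio of Equation (6); as argued above, the agreement is only approximate. The function name and test matrix are ours.

```python
import numpy as np

def erank_entropy(M, gamma=1.0):
    # Eqs. (8)-(10): turn sigma_i^gamma into a distribution, then
    # exponentiate its Shannon entropy.
    s = np.linalg.svd(M, compute_uv=False)
    p = s**gamma / (s**gamma).sum()
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(np.exp(-(p * np.log(p)).sum()))

M = np.random.randn(50, 80)
s = np.linalg.svd(M, compute_uv=False)
print(erank_entropy(M), s.sum() / s[0])  # loosely comparable values
```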

Mathematical Properties

We know that standard rank satisfies the triangle inequality $\mathop{\text{rank}}(\boldsymbol{A}+\boldsymbol{B})\leq \mathop{\text{rank}}(\boldsymbol{A}) + \mathop{\text{rank}}(\boldsymbol{B})$. The original paper proved that for symmetric positive semi-definite matrices $\boldsymbol{A},\boldsymbol{B}$, the effective rank defined by Equation (10) satisfies $\mathop{\text{erank}}(\boldsymbol{A}+\boldsymbol{B})\leq \mathop{\text{erank}}(\boldsymbol{A}) + \mathop{\text{erank}}(\boldsymbol{B})$. It remains unclear whether this inequality extends to general matrices; at present, proving which inequalities of standard rank carry over to effective rank appears to be a non-trivial task.
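The positive semi-definite case is easy to spot-check numerically. This sketch (a sanity check, not a proof) samples random symmetric positive semi-definite matrices and tests the subadditivity claim with the entropy-based definition above:

```python
import numpy as np

def erank_entropy(M):
    # Entropy-based effective rank, Eq. (10), with gamma = 1.
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
for _ in range(100):
    # X X^T is symmetric positive semi-definite by construction.
    X, Y = rng.standard_normal((2, 20, 20))
    A, B = X @ X.T, Y @ Y.T
    assert erank_entropy(A + B) <= erank_entropy(A) + erank_entropy(B)
```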

Sparsity Indices#

Connection to Sparsity Measures

From the series of effective rank definitions above, especially the progression from singular values to distributions to entropy, readers may already sense that effective rank has clear commonalities with sparsity. In fact, effective rank can be understood as a sparsity measure for the singular value vector. What distinguishes it from generic sparsity indicators is that its range is aligned to $1\leq \mathop{\text{erank}}(\boldsymbol{M}) \leq \mathop{\text{rank}}(\boldsymbol{M}) \leq n$, making it consistent with the concept of rank and thereby giving a more intuitive sense of the sparsity level.

Regarding sparsity measures, we previously conducted a systematic discussion in "How to Measure Data Sparsity?". Theoretically, the results from that article can all be used to construct effective rank. In fact, this is what we have done. In the "Norm Ratios" section, constructing effective rank based on the ratio of Schatten norm to spectral norm essentially corresponds only to Equation $(1)$ in that article. We can also use other formulas such as Equation $(16)$, which is equivalent to defining effective rank as the square of the ratio of nuclear norm to Frobenius norm:

(12) \[ \mathop{\text{erank}}(\boldsymbol{M}) \triangleq \frac{\Vert\boldsymbol{M}\Vert_*^2}{\Vert\boldsymbol{M}\Vert_F^2} = \frac{(\sum_{i=1}^n\sigma_i)^2}{\sum_{i=1}^n\sigma_i^2} \]

This also satisfies $1\leq \mathop{\text{erank}}(\boldsymbol{M}) \leq \mathop{\text{rank}}(\boldsymbol{M})$ and is a usable effective rank definition.
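A final sketch for Equation (12), checking both bounds numerically; the upper one follows from the Cauchy-Schwarz inequality $(\sum_i\sigma_i)^2 \leq r\sum_i\sigma_i^2$. The function name is illustrative.

```python
import numpy as np

def erank_l1l2(M):
    # Eq. (12): squared nuclear-to-Frobenius norm ratio.
    s = np.linalg.svd(M, compute_uv=False)
    return s.sum()**2 / (s**2).sum()

M = np.random.randn(40, 60)
e, r = erank_l1l2(M), np.linalg.matrix_rank(M)
print(1 <= e <= r)  # True: 1 <= erank <= rank
```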

Epistemological Insight

Remarkably, our understanding of effective rank and sparsity has, imperceptibly, formed a closed loop. This is undoubtedly a beautiful experience: when I began learning about sparsity measures, I knew nothing about effective rank. While learning about effective rank these past few days, I gradually realized its fundamental connection with sparsity. It seems as if some mysterious force has quietly connected the knowledge we accumulated in different fields and disciplines, ultimately converging in the same correct direction.

Article Summary#

This article explored the concept of Effective Rank for matrices, which extends the linear algebra concept of matrix rank for numerical computation, providing a more effective measure of a matrix's intrinsic dimensionality.

Key contributions and insights:

  1. Motivation: Standard mathematical rank definitions fail in numerical contexts where near-zero singular values should be treated as zero.
  2. Threshold Approaches: Absolute and relative thresholding methods provide intuitive but hyperparameter-dependent definitions.
  3. Norm Ratios: Stable rank (Frobenius-to-spectral norm ratio) and nuclear-to-spectral norm ratio offer parameter-free alternatives with computational advantages.
  4. Entropy Methods: Information-theoretic approaches based on singular value distributions reveal connections to norm-based definitions.
  5. Sparsity Connection: Effective rank emerges as a specialized sparsity measure for singular value vectors, linking disparate mathematical concepts.
  6. Computational Trade-offs: Different definitions offer varying balances between mathematical elegance, computational cost, and practical utility.

The article demonstrates that while no single definition of effective rank is universally accepted, the various approaches collectively enrich our understanding of matrix structure in numerical contexts and provide practical tools for analyzing matrix dimensionality in computational applications.

Citation Information

Original Article: Su Jianlin. Effective Rank of Matrices. Scientific Spaces.

How to cite this translation:

Su, J. Effective Rank of Matrices [Translated by ScalingOpt Team]. Scientific Spaces.

BibTeX:

@article{su2025effective_rank,
  title   = {Effective Rank of Matrices},
  author  = {Su, Jianlin},
  journal = {Scientific Spaces},
  year    = {2025},
  url     = {https://kexue.fm/archives/10569},
  note    = {Translated by ScalingOpt Team}
}